Fix flaky maxSubmitRetries test #227

Merged: 4 commits merged into share:master from alecgibson:fix-flaky-test on Jul 23, 2018

Conversation

alecgibson (Collaborator):

This is a slightly speculative fix for a test that fails intermittently
on `sharedb-mongo`. I believe these intermittent failures are due to a
race condition in a concurrency test.

The test works by attempting to fire two commits off at the same time,
and hoping that one of them is committed just before the other, so that
a `SubmitRequest.retry` is triggered whilst `maxSubmitRetries` is
set to `0`, resulting in the expected error.

However, I believe it's possible for these commits to (in some cases)
happen sequentially rather than concurrently, and fail to error.

This change attempts to force them into the retry condition (see the sketch after this list) by:

- Catching both ops in the `commit` middleware, _just_ before they're
  about to be committed (and hit a `retry` if applicable)
- Waiting until both ops have reached this state
- Triggering the first op's `commit`
- Then, in the callback of that op, triggering the second op's `commit`
- The second op should now find that the first op has beaten it to
  committing, and trigger a `retry`
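
Roughly, the sequencing looks like the sketch below. The `docCallback`/`doc2Callback` names and the `{p: ['age'], na: 2}` op appear in the diff discussed later in this thread; doc2's op value, the `request.op.op` access, and the final assertion are illustrative assumptions, not the PR's exact code.

```js
// Sketch only: hold both ops in the 'commit' middleware, then release them
// in a fixed order so doc's op always commits before doc2's.
var docCallback;
var doc2Callback;
backend.use('commit', function (request, next) {
  // Let non-op commits (e.g. an initial create) straight through.
  if (!request.op.op) return next();
  // Assumption: the op components are on request.op.op, so the two
  // submissions can be told apart by their 'na' values.
  if (request.op.op[0].na === 2) docCallback = next;
  else doc2Callback = next;
  // Once both ops are paused just before committing, release the first one.
  if (docCallback && doc2Callback) docCallback();
});

doc.submitOp({p: ['age'], na: 2}, function (error) {
  if (error) return done(error);
  // doc's op has now been committed; releasing doc2's op forces it into
  // SubmitRequest.retry, which errors because maxSubmitRetries is 0.
  doc2Callback();
});
doc2.submitOp({p: ['age'], na: 5}, function (error) {
  // Illustrative assertion: the real test checks for the retry error.
  expect(error).to.be.ok();
  done();
});
```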

@coveralls commented Jul 19, 2018:

Coverage increased (+0.02%) to 96.514% when pulling e9ff681 on alecgibson:fix-flaky-test into 762e05d on share:master.

@coveralls commented:

Coverage increased (+0.6%) to 97.101% when pulling 8332411 on alecgibson:fix-flaky-test into 762e05d on share:master.

Alec Gibson added 2 commits July 19, 2018 15:55
@ericyhwang (Contributor) left a comment:

Good sleuthing!

I believe that the second client's commit will hit the retry case as long as both clients get the same snapshot back from backend.db.getSnapshot in SubmitRequest.submit:

https://github.com/share/sharedb/blob/v1.0.0-beta.8/lib/submit-request.js#L45

Delaying the sending of commits is a pretty neat way of doing it, and I like how clean it ended up being too. 👍
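
The race behind that retry can be pictured with the simplified sketch below. This is not ShareDB's actual submit-request.js code; the function name, the collection/id parameters, and the reduced getSnapshot/commit signatures are illustrative only.

```js
// Simplified sketch of why identical snapshots force a retry: both clients
// read the same snapshot version, so whichever commit lands second is
// rejected and retries, and with maxSubmitRetries = 0 that retry errors out.
function sketchSubmit(db, collection, id, op, retriesLeft, callback) {
  db.getSnapshot(collection, id, function (error, snapshot) {    // reduced signature
    if (error) return callback(error);
    op.v = snapshot.v;                      // both submits can observe the same version here
    db.commit(collection, id, op, function (error, succeeded) {  // reduced signature
      if (error) return callback(error);
      if (succeeded) return callback();     // the first op to commit wins
      if (retriesLeft <= 0) {
        return callback(new Error('max submit retries exceeded (sketch)'));
      }
      sketchSubmit(db, collection, id, op, retriesLeft - 1, callback); // otherwise retry
    });
  });
}
```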

});
doc.submitOp({p: ['age'], na: 2}, function (error) {
  if (error) return done(error);
  doc2Callback();
ericyhwang (Contributor):

I think you can move this up to just below docCallback(), as by that point both clients will have already fetched the snapshot in submit and gone through apply.

Having the doc2Callback() call up there makes it a bit more obvious that the middleware is deferring the first callback until the second one is ready, and calling both in sequence.

alecgibson (Collaborator, Author):

Hmm. But isn't the commit hook just before they've been committed to the db? ie there's still a potential race condition if one of them gets committed before the other. That is, we don't know which callback will finish first, and get forced into retry...? This way we guarantee that doc will always be committed before doc2.

alecgibson (Collaborator, Author):

(That could be mitigated by putting back the old shared callback with a count, but I find that sort of thing a bit sketchy; it's like admitting that you're not really in control of your own test.)

ericyhwang (Contributor):

> That is, we don't know which callback will finish first, and get forced into retry...? This way we guarantee that doc will always be committed before doc2.

> (That could be mitigated by putting back the old shared callback with a count, but I find that sort of thing a bit sketchy; it's like admitting that you're not really in control of your own test.)

Got it, you want to sequence the actual commits so you know 1 goes before 2. Makes sense, thanks for explaining!

done();
var docCallback;
var doc2Callback;
backend.use('commit', function (request, callback) {
ericyhwang (Contributor):

I and future contributors (like future you!) would appreciate it if you added some comments here about why this is needed.

alecgibson (Collaborator, Author):

Done

@ericyhwang (Contributor) commented:

Forgot to drop this note for the future - the failing test and its output:
https://travis-ci.org/share/sharedb-mongo/jobs/405812496#L1163

  1) db client submit submits fail above the backend.maxSubmitRetries threshold:
     Uncaught Error: expected undefined to be truthy
      at Assertion.assert (node_modules/expect.js/index.js:96:13)
      at Assertion.ok (node_modules/expect.js/index.js:115:10)
      at cb (node_modules/sharedb/test/client/submit.js:525:25)
      at callEach (node_modules/sharedb/lib/client/doc.js:905:7)
      at Doc._clearInflightOp (node_modules/sharedb/lib/client/doc.js:891:16)
      at Doc._opAcknowledged (node_modules/sharedb/lib/client/doc.js:836:8)
      at Doc._handleOp (node_modules/sharedb/lib/client/doc.js:286:10)
      at Connection.handleMessage (node_modules/sharedb/lib/client/connection.js:243:20)
      at StreamSocket.socket.onmessage (node_modules/sharedb/lib/client/connection.js:125:18)
      at node_modules/sharedb/lib/stream-socket.js:59:12
      at _combinedTickCallback (internal/process/next_tick.js:131:7)
      at process._tickCallback (internal/process/next_tick.js:180:9)

The failure can be reproduced artificially by wrapping the doc2.submitOp call in a very short setTimeout, to simulate the doc1 commit going through very quickly or the doc2 commit's snapshot fetch being a bit slow.
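
For reference, that artificial reproduction would look roughly like the sketch below; doc2's op value and the assertion shape are assumptions, not the test's exact code.

```js
// Delaying doc2's submit lets doc's commit finish first, so doc2 fetches a
// fresh snapshot, never enters the retry path, and the expected error never
// appears (hence "expected undefined to be truthy").
doc.submitOp({p: ['age'], na: 2}, function (error) {
  if (error) return done(error);
});
setTimeout(function () {
  doc2.submitOp({p: ['age'], na: 5}, function (error) {
    expect(error).to.be.ok(); // now reliably fails: error is undefined
    done();
  });
}, 5);
```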

@ericyhwang (Contributor) left a comment:

Thanks for adding the explanations to the test!

I can merge this, but I can't publish it to NPM since I'm not a maintainer on the sharedb NPM package.

@nateps - Can you please handle merging, versioning, and publishing this? This will unblock some other PRs on e.g. sharedb-mongo that include and run this test.

.travis.yml (Outdated)

@@ -5,6 +5,6 @@ node_js:
   - "8"
   - "6"
   - "4"
-script: "npm run jshint && npm run test-cover"
+script: "ln -s .. node_modules/sharedb; npm run jshint && npm run test-cover"
ericyhwang (Contributor):

Dropping another note for the future: this is to work around the circular dependency issue between sharedb and sharedb-mingo-memory:
#226 (comment)

I'm OK with this workaround for now to unblock the tests, since it's isolated to Travis, until we figure out a better solution in the linked issue above.

ericyhwang (Contributor):

Actually, maybe it's better to get #226 merged first, as it also updates the Travis config's Node versions to reflect the latest + LTS versions.

alecgibson (Collaborator, Author):

Sure. Happy to wait and rebase. Would just be nice to get this through to unblock other PRs.

@nateps (Contributor) commented Jul 23, 2018:

Thanks for the contribution! Fixing flakey tests always brings me great joy. 💥

nateps merged commit 632fad3 into share:master on Jul 23, 2018.
alecgibson deleted the fix-flaky-test branch on July 23, 2018 at 20:52.